ABSTRACT

The choice of a cutting score for criterion-related
tests influences decisions related to classifying people into
dichotomous categories. This paper proposes an empirical methodology
for determining the best cutting score when there is information
about the test score frequency distribution of test-takers defined as
actually successful and actually unsuccessful on some criterion. The
method is based on two statistics calculated for each possible
cutting score. The first is a pure hit rate, representing the
proportion of correct classifications above that expected by chance.
The second is a chi-square statistic for testing the significance of the
difference between the population frequencies of the two types of
misclassification errors. A cutting score summary table is developed
based on the information about the test score frequency distributions
of two validation samples, one of actually successful and one of
actually unsuccessful test-takers. Cutting scores are divided into those that
yield equal frequencies of the two types of misclassification errors
and those in which the frequency of one type of error is higher than
that of the other. The cutting score summary table facilitates the
determination of the best cutting score in each category. (Contains 2
tables and 19 references.) (SLD)
ON THE CUTTING SCORE DETERMINATION
IN DICHOTOMOUS CLASSIFICATIONS 



INTRODUCTION 

The choice of a cutting score for criterion-related tests influences decisions related to
classifying people into dichotomous categories: for example, decisions based on tests for
admitting students to a college, hiring job applicants, prescribing preventive
psychopathological therapy, etc. In general, choosing an appropriate cutting score is an
essential issue in setting standards on educational, psychological, and occupational tests,
which explains the considerable number of publications related to this topic. All major works
on determining optimal cutting scores focus on the estimation of the two possible types of
misclassification errors by using different models for the true score distribution, mainly
Bayesian models and binomial models (e.g. Hambleton & Novick, 1973; Klein & Cleary, 1967;
Huynh, 1976; Lord & Stocking, 1976; Wilcox, 1977). The sum of the estimated misclassification
error probabilities, multiplied by judgmentally specified misclassification "losses", defines
the expected loss and, most frequently, the test score that minimizes the expected loss is
taken as the best cutting score. However, factors like the need to test model assumptions, the
judgmental nature of the misclassification losses, and the relatively difficult calculation of
the expected losses still keep the door open for a search for technically simple procedures
which do not include assumptions about the true score distribution.

In an attempt to make a step in this direction, the present paper proposes an empirical
methodology for determining the best cutting score when there is information about the test
score frequency distribution of test-takers defined as actually successful and actually
unsuccessful on some criterion (educational, clinical, professional, etc.).

METHOD 

The approach proposed here is methodologically based on the following two statistics
calculated for each possible cutting score:

1) A "pure hit rate", PHR, representing the proportion of correct classifications above that
expected by chance.

2) A $\chi^2$ statistic for testing the significance of the difference between the population
frequencies of the two types of misclassification errors.



Table 1 represents the general form of the two-way classification frequency distribution
yielded by a given cutting score. Cell A is the frequency of correct classifications of the type
"predicted successful - actually successful" (PS-AS), and cell D is the frequency of correct
classifications of the type "predicted unsuccessful - actually unsuccessful" (PU-AU). Cell B is
the frequency of misclassifications of the type "predicted unsuccessful - actually successful"
(PU-AS), and cell C is the frequency of the other type of misclassification, "predicted
successful - actually unsuccessful" (PS-AU). The proportion of correct classifications is
called the hit rate:



$$\mathrm{HR} = \frac{A + D}{N}, \tag{1}$$



where N = A + B + C + D. 
Table 1

                                     Predicted classification
                                 Successful   Unsuccessful   Total
Actual           Successful          A             B         A + B
classification   Unsuccessful        C             D         C + D
                 Total             A + C         B + D         N


The hit rate, as calculated by (1), is taken into account in many empirical approaches for
determining optimal cutting scores (e.g. Berk, 1976; Allen & Yen, 1979, p. 104). However, its
value includes a proportion of correct classifications that may occur by chance. In order to
avoid this problem and increase the reliability of the cutting score determination, we propose
the use of Cohen's kappa, $\kappa$ (see Cohen, 1960). In the context of the present study,
$\kappa$ represents the proportion of correct classifications, PS-AS and PU-AU, above that
expected by chance, i.e. the "pure hit rate", PHR, and is calculated by the formula:

$$\mathrm{PHR} = \frac{\mathrm{HR} - P_c}{1 - P_c}, \tag{2}$$

where: HR is the hit rate calculated by (1);

$P_c$ is the proportion of correct classifications expected to occur by chance and, in terms
of the cell frequencies in Table 1, is:
$$P_c = \frac{(A + B)(A + C) + (C + D)(B + D)}{N^2}.$$
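Formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustration, not part of the original method description: the function names are hypothetical, the chance term is assumed to be the standard Cohen's kappa expression in terms of the Table 1 cells, and the cell frequencies A = 45, B = 15, C = 10, D = 30 are illustrative values.

```python
def hit_rate(a, b, c, d):
    """Formula (1): proportion of correct classifications, (A + D) / N."""
    n = a + b + c + d
    return (a + d) / n


def chance_agreement(a, b, c, d):
    """P_c: proportion of correct classifications expected by chance,
    computed from the marginal totals of the 2x2 table."""
    n = a + b + c + d
    return ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2


def pure_hit_rate(a, b, c, d):
    """Formula (2): PHR = (HR - P_c) / (1 - P_c), i.e. Cohen's kappa.
    Not defined when P_c equals 1 (all cases fall in one cell)."""
    hr = hit_rate(a, b, c, d)
    pc = chance_agreement(a, b, c, d)
    return (hr - pc) / (1 - pc)


# Illustrative cell frequencies (N = 100).
A, B, C, D = 45, 15, 10, 30
print(hit_rate(A, B, C, D))          # hit rate, HR
print(chance_agreement(A, B, C, D))  # chance proportion, P_c
print(pure_hit_rate(A, B, C, D))     # pure hit rate, PHR
```

With these frequencies HR = .75 and P_c = .51, so the pure hit rate is .24/.49, or about .49.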

The question about the equality of the misclassifications in both "directions", PS-AU and
PU-AS, is related to testing the following null hypothesis: for a given sample of test-takers,
the entries B and C in Table 1 differ only as a result of chance sampling. If this is true, the
expected number of PS-AU misclassifications equals the expected number of PU-AS
misclassifications and is given by the average of C and B, i.e. (B + C)/2. Hence, the null
hypothesis can be tested by the use of a $\chi^2$ statistic, which is the sum of the squared
differences between the observed (O) and expected (E) frequencies, each divided by the
expected frequency:

$$\chi^2 = \sum \frac{(O - E)^2}{E}
         = \frac{\left(B - \frac{B + C}{2}\right)^2}{\frac{B + C}{2}}
         + \frac{\left(C - \frac{B + C}{2}\right)^2}{\frac{B + C}{2}}
         = \frac{(B - C)^2}{B + C}.$$

Hence, the $\chi^2$ statistic for testing the significance of the difference between the
population frequencies of the two types of misclassification errors is calculated by the
formula:

$$\chi^2 = \frac{(B - C)^2}{B + C}. \tag{3}$$

This is, in fact, an application of the McNemar test for significance of changes to the
situation represented by Table 1. In this case, 2x2 tables, the degrees of freedom are df = 1,
which makes the use of $\chi^2$ questionable when the expected frequencies, (B + C)/2, are less
than 5. Yates' correction for continuity leads to the following corrected form:

$$\chi^2 = \frac{(|B - C| - 1)^2}{B + C} \tag{4}$$

(see McNemar, 1969, pp. 260-263).

For example, if a given cutting score leads to the following cell frequencies in Table 1:
A = 45, B = 15, C = 10, and D = 30, then formula (3) gives
$\chi^2 = (15 - 10)^2/(15 + 10) = 1.00$. This number is less than the critical value,
$\chi^2 = 3.841$, at the $\alpha = .05$ level of significance with df = 1. Hence, in this case,
the cutting score is taken to yield equal population frequencies of the two types of
misclassification errors, PS-AU and PU-AS. On the other hand, the pure hit rate yielded by
this hypothetical cutting score is PHR = .49, obtained by applying formulas (1) and (2), which
give HR = .75 and $P_c$ = .51.
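The test on the two misclassification frequencies can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the `yates` flag are hypothetical, the uncorrected form is $(B - C)^2/(B + C)$, and the corrected form is assumed to be the usual continuity-corrected McNemar statistic, $(|B - C| - 1)^2/(B + C)$.

```python
CRITICAL_CHI2 = 3.841  # critical value for df = 1 at alpha = .05


def mcnemar_chi2(b, c, yates=False):
    """Chi-square statistic for the difference between the two
    misclassification frequencies B (PU-AS) and C (PS-AU).

    yates=False: (B - C)^2 / (B + C)
    yates=True : (|B - C| - 1)^2 / (B + C), the continuity-corrected form
    """
    if b + c == 0:
        raise ValueError("statistic is undefined when B + C = 0")
    diff = abs(b - c) - 1 if yates else b - c
    return diff ** 2 / (b + c)


# Example cell frequencies: B = 15, C = 10.
chi2 = mcnemar_chi2(15, 10)
chi2_corrected = mcnemar_chi2(15, 10, yates=True)

# A value below the critical value means the difference between the
# two error frequencies is not significant at the .05 level.
print(chi2, chi2 < CRITICAL_CHI2)
print(chi2_corrected, chi2_corrected < CRITICAL_CHI2)
```

For B = 15 and C = 10 the uncorrected statistic is 25/25 = 1.00 and the corrected one is 16/25 = 0.64; both fall below 3.841.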

Thus, the $\chi^2$ statistic and the PHR index answer two very important questions related to
each possible cutting score:

1) Does the cutting score yield equally serious misclassification errors, PS-AU and PU-AS?

2) What is the proportion of correct classifications above that expected by chance?

Cutting score Summary Table (CST)

Proposed here is a table that summarizes the $\chi^2$ values, the PHR values, and the cell
frequencies A, B, C, and D from Table 1 yielded by each possible cutting score. This table,
called the "Cutting score Summary Table" (CST), is based on the information about the test
score frequency distributions of two validation samples of people defined as actually
successful and actually unsuccessful. Table 2 represents a CST for hypothetical data including
a test scale, given in column C1, and the frequencies over this scale of actually successful
(AS) and actually unsuccessful (AU) test-takers, given in columns C2 and C3, respectively. The
calculation of the numbers in columns C4, C5, ..., C10 is straightforward:

C4 = the number of correct PS-AS classifications (cell A in Table 1), obtained as cumulative
frequencies from column C2;

C5 = the number of PS-AU misclassifications (cell C in Table 1), obtained as cumulative
frequencies from column C3;

C6 = the number of PU-AS misclassifications (cell B in Table 1), obtained as $N_s - A$, i.e. by
subtracting the column C4 numbers from the total number of successful people, $N_s$;

C7 = the number of correct PU-AU classifications (cell D in Table 1), obtained as $N_u - C$,
i.e. by subtracting the column C5 numbers from the total number of unsuccessful people, $N_u$;

C8 = the hit rate, HR, calculated by formula (1);

C9 = the pure hit rate, PHR, calculated by formula (2);

C10 = the $\chi^2$ statistic, calculated by formula (3).
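The column definitions above translate directly into a short routine. The following Python sketch is an illustration only: the function name `build_cst` and the dictionary keys are hypothetical, the frequencies are the hypothetical data of Table 2 (columns C2 and C3), and the score 1 stands for the pooled "0 or 1" row. The cumulative sums run from the highest score downward, so a test-taker is predicted successful when the score is at or above the cutting score.

```python
def build_cst(scores, freq_as, freq_au):
    """Build the rows of a Cutting score Summary Table.

    scores  : test scores in descending order (column C1)
    freq_as : frequencies of actually successful test-takers (column C2)
    freq_au : frequencies of actually unsuccessful test-takers (column C3)
    """
    n_s, n_u = sum(freq_as), sum(freq_au)
    n = n_s + n_u
    rows, a, c = [], 0, 0
    for score, f_s, f_u in zip(scores, freq_as, freq_au):
        a += f_s                     # C4: cumulative PS-AS (cell A)
        c += f_u                     # C5: cumulative PS-AU (cell C)
        b = n_s - a                  # C6: PU-AS (cell B)
        d = n_u - c                  # C7: PU-AU (cell D)
        hr = (a + d) / n             # C8: hit rate
        pc = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
        # C9: pure hit rate; undefined if the chance proportion is 1
        phr = (hr - pc) / (1 - pc) if pc < 1 else float("nan")
        # C10: chi-square statistic; undefined if there are no errors
        chi2 = (b - c) ** 2 / (b + c) if b + c > 0 else float("nan")
        rows.append({"score": score, "A": a, "C": c, "B": b, "D": d,
                     "HR": hr, "PHR": phr, "chi2": chi2})
    return rows


# Hypothetical data of Table 2; score 1 stands for the pooled "0 or 1" row.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1]
freq_as = [9, 18, 26, 11, 10, 4, 2, 4, 0]
freq_au = [0, 2, 7, 10, 8, 4, 7, 6, 2]

rows = build_cst(scores, freq_as, freq_au)
for r in rows:
    print(f"{r['score']:>2} {r['A']:>3} {r['C']:>3} {r['B']:>3} {r['D']:>3} "
          f"{r['HR']:.3f} {r['PHR']:.3f} {r['chi2']:6.2f}")
```

The guards for undefined values are an addition of this sketch; they are not needed for the Table 2 data but avoid a division by zero on degenerate inputs.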



Table 2

C1       C2         C3         C4      C5      C6      C7      C8      C9      C10
Test     Actually   Actually   PS-AS   PS-AU   PU-AS   PU-AU   HR      PHR     Chi-sq.
score    Success.   Unsucc.    (A)     (C)     (B)     (D)
9            9          0         9       0      75      46    .423    .078    75.00
8           18          2        27       2      57      44    .546    .218    51.27
7           26          7        53       9      31      37    .692    .393    12.10
6           11         10        64      19      20      27    .700    .347     0.02 **
5           10          8        74      27      10      19    .715    .321     7.81
4            4          4        78      31       6      15    .715    .290    16.89
3            2          7        80      38       4       8    .677    .152    27.52
2            4          6        84      44       0       2    .661    .055    44.00
0 or 1       0          2        84      46       0       0    .646    .000    46.00



The ** in column C10 indicates a $\chi^2$ statistic which is less than the critical
$\chi^2 = 3.841$ at the $\alpha = .05$ level of significance. The respective cutting score
yields equally serious misclassification errors, PS-AU and PU-AS, in the sense that it yields
equal frequencies of the two types of errors over the entire population of test-takers.

As one can see from Table 2, the test score of 6, if taken as a cutting score, is the only one
that yields equally serious misclassification errors, PS-AU and PU-AS, because its $\chi^2$
statistic (= .02) is the only one which is less than the critical $\chi^2 = 3.841$ (with
$\alpha = .05$ and df = 1). Hence, under the assumption of equally serious misclassification
errors, we can choose the cutting score of 6. One can also see that all cutting scores above
the cutting score of 6 yield a higher frequency of the PU-AS error than of the PS-AU error.
Hence, if we prefer more PU-AS errors over the entire population of test-takers, we can choose
the cutting score of 7 as the best cutting score because it yields the highest pure hit rate
(PHR = .393) among all cutting scores above the cutting score of 6. Finally, if we prefer more
PS-AU errors over the entire population of test-takers, we can choose the cutting score of 5
as the best one because it yields the highest pure hit rate (PHR = .321) among all cutting
scores below the cutting score of 6.
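The selection just described can be expressed as a small routine. The sketch below is illustrative: the function names and category labels are hypothetical, the B, C, and PHR values are copied from Table 2 (score 1 stands for the pooled "0 or 1" row), and the $\chi^2$ statistic is recomputed as $(B - C)^2/(B + C)$ rather than read from the table.

```python
CRITICAL_CHI2 = 3.841  # df = 1, alpha = .05

# (cutting score, B = PU-AS, C = PS-AU, PHR) taken from Table 2.
TABLE_2 = [
    (9, 75, 0, .078), (8, 57, 2, .218), (7, 31, 9, .393),
    (6, 20, 19, .347), (5, 10, 27, .321), (4, 6, 31, .290),
    (3, 4, 38, .152), (2, 0, 44, .055), (1, 0, 46, .000),
]


def error_category(b, c, critical=CRITICAL_CHI2):
    """Classify a cutting score by the balance of its two error types."""
    chi2 = (b - c) ** 2 / (b + c)
    if chi2 < critical:
        return "equal"           # difference not significant
    return "more_PU_AS" if b > c else "more_PS_AU"


def best_cutting_scores(rows):
    """Return, for each category, the cutting score with the highest PHR."""
    best = {}
    for score, b, c, phr in rows:
        cat = error_category(b, c)
        if cat not in best or phr > best[cat][1]:
            best[cat] = (score, phr)
    return {cat: score for cat, (score, _) in best.items()}


best = best_cutting_scores(TABLE_2)
print(best)
```

For the Table 2 data this reproduces the choices made in the text: 6 when the two errors are treated as equally serious, 7 when more PU-AS errors are acceptable, and 5 when more PS-AU errors are acceptable.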
CONCLUDING REMARKS

The method proposed here for determining the best cutting score is based on the idea of the
pure hit rate (PHR = proportion of correct classifications above that expected by chance) and
on the fact that the McNemar test, in the context of dichotomous classification tables (see
Table 1), divides the set of all possible cutting scores into three categories:

A) Cutting scores that yield equal frequencies of the two types of misclassification errors,
PS-AU and PU-AS, over the entire population of test-takers. These cutting scores yield
$\chi^2$ statistics which are less than the critical $\chi^2$ (e.g. $\chi^2 = 3.841$ at the
level $\alpha = .05$). The best cutting score is the one that yields the highest pure hit rate
among all cutting scores in this category, assuming that the two types of misclassification
errors are equally serious.

B) Cutting scores for which the frequency of the PU-AS error is higher than that of the PS-AU
error over the entire population of test-takers. These cutting scores yield $\chi^2$
statistics which are greater than the critical $\chi^2$ (e.g. $\chi^2 = 3.841$ at the level
$\alpha = .05$), and they are greater than the cutting scores in category A). The best cutting
score is the one that yields the highest pure hit rate among all cutting scores in this
category, B), assuming that the PU-AS errors are less serious than the PS-AU errors.



C) Cutting scores for which the frequency of the PS-AU error is higher than that of the PU-AS
error over the entire population of test-takers. Like the cutting scores in category B), the
cutting scores in this category also yield $\chi^2$ statistics greater than the critical
$\chi^2$ (e.g. $\chi^2 = 3.841$ at the level $\alpha = .05$), but they are less than the
cutting scores in category A). The best cutting score is the one that yields the highest pure
hit rate among all cutting scores in this category, C), if it is assumed that the PS-AU errors
are less serious than the PU-AS errors.

The Cutting score Summary Table (CST), illustrated by Table 2, facilitates the determination
of the best cutting score depending on the category, A), B), or C), that reflects the
assumption about the seriousness of the misclassification errors. The development of the CST
is straightforward with a calculator or statistical software. For example, the description of
columns C4, C5, ..., C10, given in relation to Table 2, is directly interpretable in MINITAB
commands. This is an important advantage of the method for either real data manipulations or
computer simulations in the process of determining the best cutting score for the purposes of
dichotomous classifications.